Sharp Detection in PCA under Correlations: All Eigenvalues Matter
Abstract
Principal component analysis (PCA) is a widely used method for dimension reduction. In high dimensional data, the “signal” eigenvalues corresponding to weak principal components (PCs) do not necessarily separate from the bulk of the “noise” eigenvalues. Therefore, popular tests based on the largest eigenvalue have little power to detect weak PCs. In the special case of the spiked model, certain tests asymptotically equivalent to linear spectral statistics (LSS)—averaging effects over all eigenvalues—were recently shown to achieve some power. We consider a nonparametric “local alternatives” generalization of the spiked model to the setting of Marchenko and Pastur (1967). This allows a general correlation structure even under the null hypothesis of no significant PCs. We develop new tests to detect weak PCs in this model. We show using the CLT for LSS that the optimal LSS satisfy a Fredholm integral equation of the first kind. We develop algorithms to solve it, building on our recent method for computing the limit empirical spectrum. Our analysis relies on the new concept of the weak derivative of the Marchenko-Pastur map of eigenvalues, which also leads to a new perspective on phase transitions in spiked models.
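To make the contrast concrete, here is a minimal sketch (not from the paper) comparing the largest-eigenvalue statistic with a naive linear spectral statistic on data drawn from a spiked covariance model with a sub-threshold spike. The test function f(x) = x^2 is an arbitrary placeholder, not the optimal LSS the paper derives from the Fredholm integral equation.

```python
# Minimal sketch: largest-eigenvalue statistic vs. a simple linear
# spectral statistic (LSS) under a weak spike. f(x) = x**2 is an
# arbitrary placeholder test function, NOT the paper's optimal LSS.
import numpy as np

rng = np.random.default_rng(0)
n, p = 2000, 1000          # samples, dimension; aspect ratio gamma = p/n = 0.5
gamma = p / n
spike = 1.5                # weak spike below the BBP threshold 1 + sqrt(gamma) ~ 1.71

def sample_eigs(spiked: bool) -> np.ndarray:
    """Eigenvalues of the sample covariance of n draws from N(0, Sigma)."""
    sigma = np.ones(p)
    if spiked:
        sigma[0] = spike   # one weak principal component
    X = rng.standard_normal((n, p)) * np.sqrt(sigma)
    S = X.T @ X / n
    return np.linalg.eigvalsh(S)   # ascending order

eigs_null = sample_eigs(spiked=False)
eigs_alt = sample_eigs(spiked=True)

# Largest-eigenvalue statistic: barely moves for a sub-threshold spike,
# since the top eigenvalue stays stuck at the bulk edge.
print("top eig (null / alt):", eigs_null[-1], eigs_alt[-1])

# LSS: average a test function over ALL eigenvalues; the small
# per-eigenvalue shifts it accumulates are the source of the power
# that tests of this kind can achieve against weak PCs.
f = lambda x: x**2
print("LSS (null / alt):", f(eigs_null).mean(), f(eigs_alt).mean())
```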
Similar Papers
Application of the Random Matrix Theory on the Cross-Correlation of Stock Prices
The analysis of cross-correlations is extensively applied to understand interconnections in stock markets. A variety of methods is used to study stock cross-correlations, including Random Matrix Theory (RMT), Principal Component Analysis (PCA), and Hierarchical Structures. In this work, we analyze cross-correlations between the price fluctuations of 20 company stocks...
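The standard RMT screen this snippet refers to can be sketched as follows: compute the eigenvalues of the empirical return-correlation matrix and flag those lying outside the Marchenko-Pastur bulk. The returns below are synthetic stand-ins driven by a single common "market mode"; the cited study's stock data are not reproduced here.

```python
# Hedged sketch of the RMT screen: eigenvalues of a return-correlation
# matrix outside the Marchenko-Pastur bulk [(1-sqrt(q))^2, (1+sqrt(q))^2]
# are candidate genuine cross-correlations. Synthetic returns only.
import numpy as np

rng = np.random.default_rng(1)
T, N = 1000, 20                      # trading days, stocks (20 as in the snippet)
market = rng.standard_normal(T)      # common "market mode" inducing correlations
returns = 0.3 * market[:, None] + rng.standard_normal((T, N))

R = np.corrcoef(returns, rowvar=False)   # N x N correlation matrix
eigs = np.linalg.eigvalsh(R)

q = N / T
lo, hi = (1 - np.sqrt(q))**2, (1 + np.sqrt(q))**2
print(f"MP bulk: [{lo:.3f}, {hi:.3f}]")
print("eigenvalues above the bulk:", eigs[eigs > hi])
```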
Correlation of Data Reconstruction Error and Shrinkages in Pair-wise Distances under Principal Component Analysis (PCA)
In this ‘on-going’ work, I explore certain theoretical and empirical implications of data transformations under PCA. In particular, I state and prove three theorems about PCA, which I paraphrase as follows: (1) PCA without discarding eigenvector rows is injective, but loses this injectivity when eigenvector rows are discarded; (2) PCA without discarding eigenvector rows preserves pair-wise ...
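The first claim, that PCA is invertible when no eigenvector rows are discarded and lossy otherwise, can be checked numerically. The sketch below uses a toy correlated dataset and is only an illustration of the claim, not the paper's proof.

```python
# Numerical check: projecting onto all p principal directions is
# exactly invertible, while discarding eigenvector rows is lossy.
import numpy as np

rng = np.random.default_rng(2)
n, p, k = 200, 10, 4
X = rng.standard_normal((n, p)) @ rng.standard_normal((p, p))  # correlated toy data
Xc = X - X.mean(axis=0)

# Eigenvectors of the covariance, sorted by decreasing eigenvalue.
_, V = np.linalg.eigh(Xc.T @ Xc / n)
V = V[:, ::-1]

full = Xc @ V @ V.T                 # keep all eigenvector rows: exact
part = Xc @ V[:, :k] @ V[:, :k].T   # discard rows: non-injective, lossy
print("full-rank reconstruction error:", np.linalg.norm(Xc - full))  # ~0
print(f"rank-{k} reconstruction error:", np.linalg.norm(Xc - part))  # > 0
```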
Finite Sample Approximation Results for Principal Component Analysis: a Matrix Perturbation Approach
Principal Component Analysis (PCA) is a standard tool for dimensional reduction of a set of n observations (samples), each with p variables. In this paper, using a matrix perturbation approach, we study the non-asymptotic relation between the eigenvalues and eigenvectors of PCA computed on a finite sample of size n, and those of the limiting population PCA as n → ∞. As in machine learning, we pr...
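As a rough illustration of the finite-sample effect this snippet studies, the sketch below compares sample PCA eigenvalues with an assumed toy population spectrum as n grows; the discrepancy shrinks as p/n → 0.

```python
# Sketch: sample PCA eigenvalues deviate from their population
# counterparts, with the deviation shrinking as n grows relative to p.
# The population spectrum is an assumed toy model.
import numpy as np

rng = np.random.default_rng(3)
p = 50
pop = np.linspace(1, 5, p)            # toy population eigenvalues
for n in (100, 1000, 10000):
    X = rng.standard_normal((n, p)) * np.sqrt(pop)
    sample = np.linalg.eigvalsh(X.T @ X / n)
    print(f"n={n:5d}  top pop={pop[-1]:.2f}  top sample={sample[-1]:.3f}")
```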
An Image Splicing Detection Method Based on PCA Minimum Eigenvalues
This paper presents a novel and effective image splicing forgery detection method based on the inconsistency of irrelevant components between the original and the tampered regions. These irrelevant components can be characterized by the minimum eigenvalues obtained via principal component analysis (PCA), without requiring any prior information. To avoid the impact of local structures, a pixe...
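Since the snippet is truncated, the sketch below is only a guess at the general idea: run PCA over local pixel blocks and record the minimum eigenvalue per block. The block size and all processing details are assumptions, not the cited method.

```python
# Heavily hedged sketch: a map of per-block PCA minimum eigenvalues,
# the quantity the snippet says characterizes "irrelevant components".
# Block size and blocking scheme are assumptions for illustration only.
import numpy as np

rng = np.random.default_rng(4)
img = rng.random((64, 64))            # stand-in grayscale image
B = 8                                 # block size (assumption)

def min_eig_map(image: np.ndarray, b: int) -> np.ndarray:
    h, w = image.shape
    out = np.zeros((h // b, w // b))
    for i in range(0, h - b + 1, b):
        for j in range(0, w - b + 1, b):
            block = image[i:i+b, j:j+b]
            centered = block - block.mean(axis=0)       # center columns
            cov = centered.T @ centered / b
            out[i // b, j // b] = np.linalg.eigvalsh(cov)[0]  # min eigenvalue
    return out

print(min_eig_map(img, B).round(4))
```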